Logical analysis of data: classification with justification

Authors

  • Endre Boros
  • Yves Crama
  • Peter L. Hammer
  • Toshihide Ibaraki
  • Alexander Kogan
  • Kazuhisa Makino
Abstract

Learning from examples is a frequently arising challenge, with a large number of algorithms proposed in the classification and data mining literature. The quality of such algorithms is usually evaluated ex post, on an experimental basis: their performance is measured either by cross-validation on benchmark data sets, or by clinical trials. Neither of these approaches directly evaluates the learning process ex ante, on its own merits. In this paper, we discuss a property of rule-based classifiers which we call “justifiability”, and which focuses on the type of information extracted from the given training set in order to classify new observations. We investigate some interesting mathematical properties of justifiable classifiers. In particular, we establish the existence of justifiable classifiers, and we show that several well-known learning approaches, such as decision trees or nearest-neighbor-based methods, automatically provide justifiable classifiers. We also identify maximal subsets of observations which must be classified in the same way by every justifiable classifier. Finally, we illustrate by a numerical example that using classifiers based on “most justifiable” rules does not seem to lead to overfitting, even though it involves an element of optimization.

Similar resources

Support Vector Machine Based Facies Classification Using Seismic Attributes in an Oil Field of Iran

Seismic facies analysis (SFA) aims to classify similar seismic traces based on amplitude, phase, frequency, and other seismic attributes. SFA has proven useful in interpreting seismic data, allowing significant information on subsurface geological structures to be extracted. While facies analysis has been widely investigated through unsupervised-classification-based studies, there are few cases...

Logical selection of potential hub nodes in location of strategic facilities by a hybrid methodology of Data Envelopment Analysis and Analytic Hierarchical Process: Iran Aviation case study

Hub facility location problem looks to find the most appropriate location for deploying such facilities. An important factor in such a problem is the pool of potential locations from which the optimal locations must be selected. The present research was performed to address two key objectives: identifying the factors contributing to the selection locations for hub establishment, and presenting ...

Why Do We Need Justification Logic?

In this paper, we will sketch the basic system of Justification Logic, which is a general logical framework for reasoning about epistemic justification. Justification Logic renders a new, evidence-based foundation for epistemic logic. As a case study, we compare formalizations of the Kripke ‘Red Barn’ scenario in modal epistemic logic and Justification Logic and show here that the latter provid...

TR-2008014: Why Do We Need Justification Logic?

In this paper, we will sketch the basic system of Justification Logic, which is a general logical framework for reasoning about epistemic justification. Justification Logic renders a new, evidence-based foundation for epistemic logic. As a case study, we compare formalizations of the Kripke ‘Red Barn’ scenario in modal epistemic logic and Justification Logic and show here that the latter provid...

Evaluation of Ethical and Unethical Behaviors of Profit Management based on Knowledge Analysis and Fuzzy Network Analysis Model

Background: There is a better way to identify profit management companies, and that is to focus on the motivation and ethical beliefs of managers. Therefore, the purpose of this study is to evaluate the ethical and immoral behaviors of earnings management. Method: The method of conducting the present research in the category of qualitative research is descriptive-survey. The statistical popula...

Interval network data envelopment analysis model for classification of investment companies in the presence of uncertain data

The main purpose of this paper is to propose an approach for performance measurement, classification and ranking the investment companies (ICs) by considering internal structure and uncertainty. In order to reach this goal, the interval network data envelopment analysis (INDEA) models are extended. This model is capable to model two-stage efficiency with intermediate measures i...

Journal:
  • Annals OR

Volume 188  Issue 

Pages  -

Publication date 2011